EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data

نویسندگان

  • Sarah Vluymans
  • Isaac Triguero
  • Chris Cornelis
  • Yvan Saeys
چکیده

Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It performs an evolutionary prototype reduction focused on providing diverse solutions to prevent the method from overfitting the training set. It also allows us to explicitly reduce the underrepresented class, which the most common preprocessing solutions handling class imbalance usually protect. As part of the experimental study, we show that the proposed prototype reduction method outperforms state-of-theart preprocessing techniques. The preprocessing step yields multiple prototype sets that are later used in an ensemble, performing a weighted voting scheme with the nearest neighbor classifier. EPRENNID is experimentally shown to significantly outperform previous proposals. & 2016 Elsevier B.V. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prototype reduction techniques: A comparison among different approaches

The main two drawbacks of nearest neighbor based classifiers are: high CPU costs when the number of samples in the training set is high and performance extremely sensitive to outliers. Several attempts of overcoming such drawbacks have been proposed in the pattern recognition field aimed at selecting/gen-erating an adequate subset of prototypes from the training set. The problem addressed in th...

متن کامل

Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification

Nearest neighbor classification is one of the most used and well known methods in data mining. Its simplest version has several drawbacks, such as low efficiency, high storage requirements and sensitivity to noise. Data reduction techniques have been used to alleviate these shortcomings. Among them, prototype selection and generation techniques have been shown to be very effective. Positioning ...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

Improved Fuzzy-Optimally Weighted Nearest Neighbor Strategy to Classify Imbalanced Data

Learning from imbalanced data is one of the burning issues of the era. Traditional classification methods exhibit degradation in their performances while dealing with imbalanced data sets due to skewed distribution of data into classes. Among various suggested solutions, instance based weighted approaches secured the space in such cases. In this paper, we are proposing a new fuzzy weighted near...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Neurocomputing

دوره 216  شماره 

صفحات  -

تاریخ انتشار 2016